Adapting machine translation models toward misrecognized speech with text-to-speech pronunciation rules and acoustic confusability
Authors
Abstract
In the spoken language translation pipeline, machine translation systems that are trained solely on written bitexts are often unable to recover from speech recognition errors due to the mismatch in training data. We propose a novel technique to simulate the errors generated by an ASR system, using the ASR system’s pronunciation dictionary and language model. Lexical entries in the pronunciation dictionary are converted into phoneme sequences using a text-to-speech (TTS) analyzer and stored in a phoneme-to-word translation model. The translation model and ASR language model are combined into a phoneme-to-word MT system that “damages” clean texts to look like ASR outputs based on acoustic confusions. Training texts are TTS-converted and damaged into synthetic ASR data for use as adaptation data for training a speech translation system. Our proposed technique yields consistent improvements in translation quality on English-French lectures.
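The "damaging" step described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the lexicon, the unigram probabilities, and the function names `to_phonemes` and `damage` are all hypothetical, a unigram model stands in for the ASR language model, and substitutions between acoustically confusable phonemes are left out. The sketch only shows how converting clean text to phonemes and decoding it back to words under a language model can produce ASR-like word errors.

```python
import math

# Hypothetical toy lexicon: word -> phoneme sequence. It stands in for
# pronunciation dictionary entries converted by a TTS analyzer.
LEXICON = {
    "i": ("AY",),
    "ice": ("AY", "S"),
    "scream": ("S", "K", "R", "IY", "M"),
    "cream": ("K", "R", "IY", "M"),
}

# Hypothetical unigram probabilities standing in for the ASR language model.
UNIGRAM = {"i": 0.2, "ice": 0.3, "scream": 0.1, "cream": 0.4}


def to_phonemes(words):
    """Concatenate word pronunciations, discarding word boundaries."""
    phones = []
    for word in words:
        phones.extend(LEXICON[word])
    return phones


def damage(words):
    """Re-decode the phoneme string into the most probable word sequence."""
    phones = to_phonemes(words)
    n = len(phones)
    # best[i] holds (log-probability, words) for the best cover of phones[:i].
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for i in range(n):
        score, hyp = best[i]
        if score == -math.inf:
            continue  # no segmentation reaches position i
        for word, pron in LEXICON.items():
            j = i + len(pron)
            # Exact phoneme match only; a fuller model would also score
            # substitutions between confusable phonemes.
            if tuple(phones[i:j]) == pron:
                cand = score + math.log(UNIGRAM[word])
                if cand > best[j][0]:
                    best[j] = (cand, hyp + [word])
    return best[n][1]


print(damage(["i", "scream"]))
```

With these toy numbers, the clean input "i scream" shares its phoneme string with "ice cream", and the language model prefers the latter, so the output is a plausible misrecognition. Pairs of clean and damaged text produced this way would play the role of the synthetic adaptation data.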
Similar resources
Augmenting Translation Models with Simulated Acoustic Confusions for Improved Spoken Language Translation
We propose a novel technique for adapting text-based statistical machine translation to deal with input from automatic speech recognition in spoken language translation tasks. We simulate likely misrecognition errors using only a source language pronunciation dictionary and language model (i.e., without an acoustic model), and use these to augment the phrase table of a standard MT system. The a...
Cross-Word Arabic Pronunciation Variation Modeling Using Part of Speech Tagging
Speech recognition is often used as the front-end for many natural language processing (NLP) applications. Some of these applications include machine translation, information retrieval and extraction, voice dialing, call routing, speech synthesis/recognition, data entry, dictation, control, etc. Thus, much research work has been done to improve the speech recognition and the related NLP applica...
Pronunciation Modeling for Large Vocabulary Speech Recognition by Arthur
The large pronunciation variability of words in conversational speech is one of the major causes of low accuracy for automatic speech recognition (ASR). Many pronunciation modeling approaches have been developed to address this problem. Some explicitly manipulate the pronunciation dictionary as well as the set of the units used to define the pronunciations of words. Others model the pronunciati...
Joint pronunciation modelling of non-native speakers using data-driven methods
Modelling non-native speakers with different mother tongues is a difficult task for automatic speech recognition due to the large variation among speakers. One possibility for jointly modelling all speakers is to use the same speaker independent acoustic models and a joint lexicon to capture the variation. We have modified the reference lexicon using pronunciation rules that are derived in a to...
Dictionary learning: performance through consistency
We present first results from our efforts in automatically increasing and adapting phonetic dictionaries for spontaneous speech recognition. Spontaneous speech adds a variety of phenomena to a speech recognition task: false starts [1], human and nonhuman noises [2], new words [3] and alternative pronunciations. All of these phenomena have to be tackled when adapting a speech recognition system for...
Publication date: 2015